Three different finite field multipliers are presented: (1) a dual basis multiplier due to
Berlekamp, (2) the Massey-Omura normal basis multiplier, and (3) the
Scott-Tavares-Peppard standard basis multiplier. These algorithms are chosen because each
has its own distinct features which apply most suitably in different areas. Finally, they
are implemented on silicon chips with NMOS technology so that the multiplier most
desirable for VLSI implementations can readily be ascertained.


I. Introduction 

The era of the VLSI digital signal processor has arrived and 
its impact is evident in many areas of technology. The trend is 
to put more and more elements on a single silicon chip in order 
to enhance the performance and reliability of the system. 

Recently, finite field arithmetic has found widespread
applications. Examples include cryptography, coding theory,
and computer arithmetic. Among the finite field arithmetic
operations, multiplication is the most complex and time
consuming. Already, it is used in a variety of systems, e.g., the
VLSI design of a Reed-Solomon coder (Refs. 1 and 2). Hence,
a small, high performance finite field multiplier is urgently
needed. Such a multiplier can also be used as a building block
for the design of many large systems which use finite field
arithmetic.


In this article, three different finite field multipliers are
compared for suitability of VLSI implementation. These
include: (1) the dual basis multiplier due to Berlekamp (Ref. 3),
(2) the Massey-Omura normal basis multiplier (Ref. 4), and
(3) the Scott-Tavares-Peppard standard basis multiplier (Ref. 5).
They are chosen for comparison because each has its own
distinct features which make it suitable for specific
applications.

Different basis representations of field elements are used in
these three multipliers. The dual basis multiplier uses the dual
basis representation for the multiplicand and the standard basis
for the multiplier. The product is again in dual basis
representation. The Massey-Omura multiplier uses normal basis
representations for both the multiplicand and multiplier. The
Scott-Tavares-Peppard multiplier uses the standard basis
representation for all field elements. The complexity of basis
conversion is heavily dependent on the choice of the primitive
irreducible polynomial which generates the field. If the
polynomial is chosen appropriately, the basis conversion is a
simple operation. The algorithms for performing the basis
conversions are presented in Appendixes A and B.

It is concluded in this article that the dual basis multiplier
needs the least number of gates, which in turn leads to the
smallest area required for VLSI implementation. The
Massey-Omura multiplier is very effective in performing operations
such as finding inverse elements or in performing squaring or
exponentiation of a finite field element. The standard basis
multiplier does not require basis conversion; hence it is readily
matched to any input or output system. Also, due to its
regularity and simplicity, the design and expansion to high
order finite fields are easier to realize than in the dual or
normal basis multipliers.

Examples of these three multipliers are given in this article 
for the purpose of illustration. Their 8-bit versions are imple- 
mented on a silicon chip. The chip layouts are also presented 
separately so that their differences can be distinguished with- 
out difficulty. 

II. The Dual Basis Multiplication Algorithm 

Recently, Berlekamp developed a bit-serial multiplier for
use in the design of a Reed-Solomon encoder (Ref. 3). Hsu
et al. (Ref. 1) used Berlekamp’s multiplier to design an 8-bit
single chip VLSI (255,223) Reed-Solomon encoder which has
proved to perform well. However, in that design, the
multiplicand is a fixed finite field constant, which is
inconvenient if one desires to change the multiplicand.

In the following, Berlekamp’s bit-serial multiplication
algorithm is modified and generalized to allow both the
multiplicand and multiplier to be variable. This revision of
Berlekamp’s algorithm is called the dual basis multiplication
algorithm in the rest of this article.

In order to understand the dual basis multiplication algo- 
rithm, some mathematical preliminaries are needed. Toward 
this end, the mathematical concepts of the “trace” and a 
“dual” basis are introduced. For more details and proofs see 
Refs. 2, 6, and 7. 

Definition 1. The trace of an element $\beta$ belonging to
$GF(p^m)$, the Galois field of $p^m$ elements, is defined as
follows:

$$\mathrm{Tr}(\beta) = \sum_{k=0}^{m-1} \beta^{p^k}$$

In particular, for $p = 2$,

$$\mathrm{Tr}(\beta) = \sum_{k=0}^{m-1} \beta^{2^k}$$

A fast algorithm for computing trace values of elements in
$GF(2^m)$ is presented in Appendix C. The trace has the
following properties, which will not be proved here:

(1) $[\mathrm{Tr}(\beta)]^p = \beta^p + \beta^{p^2} + \cdots + \beta^{p^{m-1}} + \beta = \mathrm{Tr}(\beta)$,
    where $\beta \in GF(p^m)$ and $\beta^{p^m} = \beta$. This implies
    that $\mathrm{Tr}(\beta) \in GF(p)$, i.e., the trace is in the
    ground field $GF(p)$.

(2) $\mathrm{Tr}(\beta + \gamma) = \mathrm{Tr}(\beta) + \mathrm{Tr}(\gamma)$, where $\beta, \gamma \in GF(p^m)$.

(3) $\mathrm{Tr}(c\beta) = c\,\mathrm{Tr}(\beta)$, where $c \in GF(p)$.

(4) $\mathrm{Tr}(1) \equiv m \pmod{p}$.
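
The properties above can be checked numerically. The following Python
sketch is not part of the original article. It assumes $GF(2^4)$
generated by $x^4 + x + 1$, with each element stored as an integer
whose bit $k$ is the coefficient of $\alpha^k$; the helper names
`gf_mul` and `trace` are hypothetical.

```python
M = 4            # field degree (assumption: GF(2^4))
POLY = 0b10011   # x^4 + x + 1, including the x^4 term

def gf_mul(a, b):
    """Multiply two elements of GF(2^M) given in the standard basis."""
    r = 0
    for _ in range(M):
        if b & 1:
            r ^= a
        b >>= 1
        a <<= 1
        if (a >> M) & 1:      # reduce as soon as the degree reaches M
            a ^= POLY
    return r

def trace(beta):
    """Tr(beta) = beta + beta^2 + beta^4 + ... + beta^(2^(M-1))."""
    total, power = 0, beta
    for _ in range(M):
        total ^= power
        power = gf_mul(power, power)
    return total

elements = range(1 << M)
# Property (1): every trace value lies in the ground field GF(2).
assert all(trace(b) in (0, 1) for b in elements)
# Property (2): the trace is additive (field addition is XOR).
assert all(trace(b ^ c) == trace(b) ^ trace(c)
           for b in elements for c in elements)
# Property (4): Tr(1) = M mod 2.
assert trace(1) == M % 2
```

Property (3) is immediate over $GF(2)$, since $c$ can only be 0 or 1.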

Definition 2. A basis $\{u_k\}$ in $GF(p^m)$ is a set of $m$
linearly independent elements in $GF(p^m)$.

Definition 3. Two bases $\{u_j\}$ and $\{\lambda_k\}$ are said to
be the dual of one another if

$$\mathrm{Tr}(u_j \lambda_k) = \begin{cases} 1, & \text{if } j = k \\ 0, & \text{if } j \neq k \end{cases}$$

For convenience, the basis $\{u_j\}$ is sometimes called the
original basis, and the basis $\{\lambda_k\}$ is called its dual
basis, even though the concept of duality is symmetric.

Theorem. Every basis has a unique dual basis. 

Proof. See Ref. 8. 
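
As an illustration of Definition 3 and the Theorem, the dual basis of a
small standard basis can be found by exhaustive search. The Python
sketch below is not from the original article. It assumes $GF(2^4)$
generated by $x^4 + x + 1$ and the standard basis
$\{1, \alpha, \alpha^2, \alpha^3\}$; the names `gf_mul`, `trace` and
`dual_basis` are hypothetical.

```python
M, POLY = 4, 0b10011   # assumption: GF(2^4), f(x) = x^4 + x + 1

def gf_mul(a, b):
    """Multiply two elements of GF(2^M); bit k is the coefficient of alpha^k."""
    r = 0
    for _ in range(M):
        if b & 1:
            r ^= a
        b >>= 1
        a <<= 1
        if (a >> M) & 1:
            a ^= POLY
    return r

def trace(beta):
    """Sum of the M conjugates beta^(2^k); the result is 0 or 1."""
    total, power = 0, beta
    for _ in range(M):
        total ^= power
        power = gf_mul(power, power)
    return total

def dual_basis(basis):
    """Return [lambda_0, ...] with Tr(u_j * lambda_k) = 1 if j == k else 0."""
    dual = []
    for k in range(M):
        matches = [lam for lam in range(1 << M)
                   if all(trace(gf_mul(u, lam)) == (j == k)
                          for j, u in enumerate(basis))]
        assert len(matches) == 1   # existence and uniqueness (the Theorem)
        dual.append(matches[0])
    return dual

standard = [1 << k for k in range(M)]   # 1, alpha, alpha^2, alpha^3
print(dual_basis(standard))             # prints [9, 4, 2, 1]
```

Under these assumptions the search gives $\lambda_0 = 1 + \alpha^3$,
$\lambda_1 = \alpha^2$, $\lambda_2 = \alpha$, and $\lambda_3 = 1$.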

Corollary 1. Let $\{u_j\}$ be a basis of $GF(p^m)$ and let
$\{\lambda_k\}$ be its dual basis. Then a field element $Z$ can be
expressed in the dual basis $\{\lambda_k\}$ by the expansion

$$Z = \sum_{k=0}^{m-1} z_k \lambda_k$$

where

$$z_k = \mathrm{Tr}(Z \cdot u_k)$$

Proof. Let

$$Z = z_0 \lambda_0 + z_1 \lambda_1 + \cdots + z_{m-1} \lambda_{m-1}$$

be the expansion of $Z$ in the dual basis. Multiply both sides by
$u_k$ and take the trace. Then by Definition 3 and the properties
of the trace:

$$\mathrm{Tr}(Z \cdot u_k) = \mathrm{Tr}\left( \sum_{i=0}^{m-1} z_i (\lambda_i u_k) \right) = z_k$$

The following corollary is an immediate consequence of
Corollary 1.

Corollary 2. Let $\{u_j\}$ be a basis of $GF(p^m)$ and let
$\{\lambda_k\}$ be its dual basis. The product $W = ZG$ of two
fixed elements in $GF(p^m)$ can be expressed in the dual basis by
the expansion

$$W = \sum_{k=0}^{m-1} \mathrm{Tr}(W u_k)\, \lambda_k = \sum_{k=0}^{m-1} \mathrm{Tr}(ZG u_k)\, \lambda_k$$

where $\mathrm{Tr}(W u_k)$ is the $k$th coefficient of the dual
basis for the product of two field elements.

These two corollaries provide a theoretical basis for the
dual basis finite field multiplier. In the following section, a
detailed example is developed to illustrate the dual basis
bit-serial multiplication algorithm.

III. An Example of the Dual Basis 
Multiplication Algorithm 

The example is given in $GF(2^4)$ for purposes of illustration;
the extension to more general cases is obvious.

Let $\alpha$ be a root of the primitive irreducible polynomial
$f(x) = x^4 + x + 1$ over $GF(2)$. Then $\alpha$ satisfies the
equation $\alpha^4 + \alpha + 1 = 0$. It is also true that
$\alpha^{15} = 1$. Let the standard basis be
$\{1, \alpha, \alpha^2, \alpha^3\}$ and its dual basis be
$\{\lambda_0, \lambda_1, \lambda_2, \lambda_3\}$.

Then, by Definition 3,

$$\mathrm{Tr}(1 \cdot \lambda_0) = 1$$
$$\mathrm{Tr}(\alpha \cdot \lambda_1) = 1$$
$$\mathrm{Tr}(\alpha^2 \cdot \lambda_2) = 1$$

and

$$\mathrm{Tr}(\alpha^3 \cdot \lambda_3) = 1$$
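Let

$$Z = \sum_{k=0}^{3} z_k \lambda_k$$

be represented in dual basis. Also let

$$G = \sum_{k=0}^{3} g_k \alpha^k$$

be represented in standard basis. Then, by Corollary 1,

$$z_k = \mathrm{Tr}(Z \alpha^k)$$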

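Furthermore, let $W = ZG$ be the product of the two elements $Z$
and $G$. If $W$ is represented in dual basis, then

$$W = \sum_{k=0}^{3} w_k \lambda_k$$

where, by Corollary 1,

$$w_k = \mathrm{Tr}(W \alpha^k) = \mathrm{Tr}(ZG \cdot \alpha^k)$$

If one defines

$$T^{(k)}(ZG) = \mathrm{Tr}(ZG \cdot \alpha^k)$$

then

$$\begin{aligned}
T^{(0)}(ZG) &= \mathrm{Tr}(ZG \alpha^0) = \mathrm{Tr}(ZG) \\
&= \mathrm{Tr}(Z(g_0 \alpha^0 + g_1 \alpha + g_2 \alpha^2 + g_3 \alpha^3)) \\
&= z_0 g_0 + z_1 g_1 + z_2 g_2 + z_3 g_3 \qquad (1)
\end{aligned}$$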

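From the definition of $T^{(k)}(ZG)$, one obtains

$$T^{(k)}(ZG) = \mathrm{Tr}((\alpha Z) G \cdot \alpha^{k-1}) = T^{(k-1)}(\alpha ZG)$$

Therefore, if $ZG$ is replaced by $\alpha \cdot ZG$ in
$T^{(k-1)}$, i.e., $Z$ by $\alpha \cdot Z$, $T^{(k)}(ZG)$ can be
obtained from $T^{(k-1)}(ZG)$. Let

$$Y = \alpha Z = y_0 \lambda_0 + y_1 \lambda_1 + y_2 \lambda_2 + y_3 \lambda_3$$

where

$$y_m = \mathrm{Tr}(Y \cdot \alpha^m) = \mathrm{Tr}(Z \cdot \alpha^{m+1})$$

for each $m$. Then $T^{(k)}$ is obtained from $T^{(k-1)}$ by
replacing

$z_0$ by $y_0 = \mathrm{Tr}(Z\alpha) = z_1$
$z_1$ by $y_1 = \mathrm{Tr}(Z\alpha^2) = z_2$
$z_2$ by $y_2 = \mathrm{Tr}(Z\alpha^3) = z_3$

and

$z_3$ by $y_3 = \mathrm{Tr}(Z\alpha^4) = \mathrm{Tr}(Z(\alpha + 1)) = z_0 + z_1$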

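To reiterate,

$$W = ZG = \sum_{k=0}^{3} w_k \lambda_k$$

can be computed as follows:

(1) Initially, for $k = 0$, compute $T^{(0)}(ZG)$ by Eq. (1).

(2) For $k = 1, 2, 3$, compute $T^{(k)}(ZG)$ by

    $$T^{(k)}(ZG) = T^{(k-1)}(YG)$$

    and

    $$Y = \alpha Z = y_0 \lambda_0 + y_1 \lambda_1 + y_2 \lambda_2 + y_3 \lambda_3$$

    with

    $$y_0 = z_1,\quad y_1 = z_2,\quad y_2 = z_3,\quad \text{and}\quad y_3 = z_0 + z_1 = z_f$$

    where $z_f = z_0 + z_1$ is the feedback term of the
    algorithm.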
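
The two steps above can be sketched in software. The following Python
fragment is an illustrative model written for this example in
$GF(2^4)$ with $f(x) = x^4 + x + 1$; it is not the article's hardware
design. The names `gf_mul`, `trace`, `to_dual` and
`dual_basis_multiply` are hypothetical, and the reference product is
computed with an ordinary standard-basis multiplication.

```python
M, POLY = 4, 0b10011          # GF(2^4), f(x) = x^4 + x + 1

def gf_mul(a, b):
    """Reference multiplication in the standard basis."""
    r = 0
    for _ in range(M):
        if b & 1:
            r ^= a
        b >>= 1
        a <<= 1
        if (a >> M) & 1:
            a ^= POLY
    return r

def trace(beta):
    total, power = 0, beta
    for _ in range(M):
        total ^= power
        power = gf_mul(power, power)
    return total

def to_dual(z):
    """Dual-basis coordinates z_k = Tr(Z * alpha^k) (Corollary 1)."""
    return [trace(gf_mul(z, 1 << k)) for k in range(M)]

def dual_basis_multiply(z_dual, g_std):
    """z_dual = [z_0..z_3] in dual basis, g_std = [g_0..g_3] in standard
    basis. Returns [w_0..w_3], the product W = ZG in the dual basis."""
    z = list(z_dual)
    w = []
    for _ in range(M):
        bit = 0
        for zi, gi in zip(z, g_std):     # inner product, as in Eq. (1)
            bit ^= zi & gi
        w.append(bit)
        # Replace Z by Y = alpha * Z: shift and feed back z_f = z_0 + z_1.
        z = z[1:] + [z[0] ^ z[1]]
    return w

# Exhaustive comparison with the reference product.
for Z in range(1 << M):
    for G in range(1 << M):
        g_bits = [(G >> k) & 1 for k in range(M)]
        assert dual_basis_multiply(to_dual(Z), g_bits) == to_dual(gf_mul(Z, G))
```

The register update `z[1:] + [z[0] ^ z[1]]` is the only place where the
field polynomial enters, which is why the feedback term changes when a
different polynomial is chosen.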

The above example illustrates the dual basis multiplication
algorithm. The extension to an 8-bit multiplier is obvious and
will not be included here. The primitive irreducible polynomial
in the 8-bit design is chosen to be

$$f(x) = x^8 + x^4 + x^3 + x^2 + 1$$

In this case, since $\alpha^8 = \alpha^4 + \alpha^3 + \alpha^2 + 1$,
the feedback term is

$$z_f = \mathrm{Tr}(Z\alpha^8) = z_0 + z_2 + z_3 + z_4$$

Figure 1 shows the logic diagram of an 8-bit dual basis
multiplier. The architecture is composed of four blocks. In
Fig. 1, the Z-serial-to-parallel unit performs the serial-to-parallel
operation of the input element with dual basis representation.
Once all 8 bits of this element are stored in the bottom register,
the cyclic operation starts and one bit is fed back from the
feedback logic circuitry. The G-serial-to-parallel unit also
performs the serial-to-parallel operation of the input standard
basis element. Once all 8 bits of this element are stored in the
bottom register, they are latched bitwise so that no further
operations are performed on this element, as required by the
algorithm.

Next, the output bits of these two units are fed into the
AND-generation unit. The output consists of the bitwise
AND-ed terms. These AND-ed terms are in turn fed into the
XOR-array unit, which performs the addition of the AND-ed
terms, since the addition of two elements in $GF(2)$ is just an
exclusive-OR (XOR) operation. The terms included in this
XOR-array are as follows:

$$z_0 g_0 + z_1 g_1 + z_2 g_2 + z_3 g_3 + z_4 g_4 + z_5 g_5 + z_6 g_6 + z_7 g_7$$

The product is then obtained from the output of this XOR-array
bit by bit. Figure 2 shows the layout of this dual basis
multiplier.
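
A software model of the 8-bit data path may help in following the
block description. The Python sketch below is an assumption-laden
illustration, not a description of the chip: it models the Z register
with feedback, the latched G register, the AND-generation unit, and the
XOR array for $f(x) = x^8 + x^4 + x^3 + x^2 + 1$. The feedback taps are
derived from the low-order terms of $f(x)$, and all function names are
hypothetical.

```python
M, POLY = 8, 0x11D            # f(x) = x^8 + x^4 + x^3 + x^2 + 1
# alpha^8 is the sum of alpha^j over the low-order terms of f(x),
# so the feedback bit is the XOR of the z_j at those positions.
TAPS = [j for j in range(M) if (POLY >> j) & 1]

def gf_mul(a, b):
    """Reference multiplication in the standard basis."""
    r = 0
    for _ in range(M):
        if b & 1:
            r ^= a
        b >>= 1
        a <<= 1
        if (a >> M) & 1:
            a ^= POLY
    return r

def trace(beta):
    total, power = 0, beta
    for _ in range(M):
        total ^= power
        power = gf_mul(power, power)
    return total

def to_dual(z):
    """Dual-basis coordinates Tr(Z * alpha^k), k = 0..7."""
    return [trace(gf_mul(z, 1 << k)) for k in range(M)]

def dual_basis_multiply(z_dual, g_std):
    z = list(z_dual)                     # Z register (dual basis)
    product = []
    for _ in range(M):
        bit = 0
        for zi, gi in zip(z, g_std):     # AND-generation, then XOR array
            bit ^= zi & gi
        product.append(bit)              # one product bit per cycle
        feedback = 0
        for j in TAPS:                   # feedback logic
            feedback ^= z[j]
        z = z[1:] + [feedback]           # cyclic operation of the Z register
    return product

DUAL_TABLE = [to_dual(z) for z in range(1 << M)]
for Z in range(1 << M):
    for G in range(1, 1 << M, 17):       # a sample of multipliers
        g_bits = [(G >> k) & 1 for k in range(M)]
        assert dual_basis_multiply(DUAL_TABLE[Z], g_bits) == DUAL_TABLE[gf_mul(Z, G)]
```

The G bits are never modified inside the loop, mirroring the latched
standard basis register described above.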


IV. VLSI Architecture for the Massey-Omura 
Multiplier 

Recently, Massey and Omura developed a multiplier which
obtains the product of two elements in the finite field $GF(2^m)$.
In this invention, they utilize a normal basis of the form
$\{\alpha, \alpha^2, \alpha^4, \ldots, \alpha^{2^{m-1}}\}$ to
represent each element in the field, where $\alpha$ is the root of
an irreducible polynomial of degree $m$ over $GF(2)$. In this
basis, each element in the field $GF(2^m)$ can be represented by
$m$ binary digits.

Using the normal basis representation, the squaring of an
element in $GF(2^m)$ is readily shown to be a simple cyclic
shift of its binary digits (Ref. 4). Multiplication of two
elements with a normal basis representation requires the same
logic circuitry for every product digit. Adjacent product digit
circuits differ only in their inputs, which are cyclically shifted
versions of one another (Ref. 4).
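
Both statements can be checked on a small field. The Python sketch
below is an illustration only and does not use the article's 8-bit
polynomial: it assumes $GF(2^4)$ generated by $x^4 + x^3 + 1$, whose
root $\alpha$ gives the normal basis
$\{\alpha, \alpha^2, \alpha^4, \alpha^8\}$. The product digit function
`f` here is evaluated by table look-up rather than as a sum of AND
terms, and all names are hypothetical.

```python
M, POLY = 4, 0b11001          # assumption: GF(2^4) with x^4 + x^3 + 1

def gf_mul(a, b):
    """Multiplication on standard-basis integers (bit k <-> alpha^k)."""
    r = 0
    for _ in range(M):
        if b & 1:
            r ^= a
        b >>= 1
        a <<= 1
        if (a >> M) & 1:
            a ^= POLY
    return r

# Normal basis alpha, alpha^2, alpha^4, alpha^8 as standard-basis integers.
NORMAL = [0b0010]
for _ in range(M - 1):
    NORMAL.append(gf_mul(NORMAL[-1], NORMAL[-1]))

def from_normal(coords):
    """Field element with the given normal-basis coordinates."""
    x = 0
    for c, n in zip(coords, NORMAL):
        if c:
            x ^= n
    return x

# Map every field element to its normal-basis coordinates.
TO_NORMAL = {}
for code in range(1 << M):
    coords = tuple((code >> i) & 1 for i in range(M))
    TO_NORMAL[from_normal(coords)] = coords
assert len(TO_NORMAL) == 1 << M   # the conjugates are linearly independent

def rotate_left(coords, k):
    return coords[k:] + coords[:k]

def f(a, b):
    """A single product digit: coordinate 0 of the product of a and b."""
    return TO_NORMAL[gf_mul(from_normal(a), from_normal(b))][0]

for x in range(1 << M):
    a = TO_NORMAL[x]
    # Squaring is a cyclic shift of the coordinates.
    assert TO_NORMAL[gf_mul(x, x)] == a[-1:] + a[:-1]
    for y in range(1 << M):
        b = TO_NORMAL[y]
        c = TO_NORMAL[gf_mul(x, y)]
        # Every product digit uses the same f on cyclically shifted inputs.
        assert all(c[k] == f(rotate_left(a, k), rotate_left(b, k))
                   for k in range(M))
```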

The conventional method for finding an inverse element in
a finite field uses either table look-up or Euclid’s algorithm.
These methods are not easily realized in a VLSI circuit.
However, by using a Massey-Omura multiplier, a recursive, pipelined
inversion circuit can be developed (Ref. 4). The details of the
Massey-Omura multiplier algorithm are not discussed further
here. For a more detailed discussion, see Ref. 4.
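
One common way to obtain an inverse from squarings and multiplications
alone uses the identity
$a^{-1} = a^{2^m - 2} = a^2 \cdot a^4 \cdots a^{2^{m-1}}$. The Python
sketch below illustrates that identity only; it is not claimed to be
the pipelined circuit of Ref. 4. It works on standard-basis integers in
$GF(2^8)$ with the polynomial $x^8 + x^7 + x^6 + x + 1$, and the names
`gf_mul` and `gf_inverse` are hypothetical. In a normal basis each
squaring would reduce to a cyclic shift.

```python
M, POLY = 8, 0x1C3            # x^8 + x^7 + x^6 + x + 1

def gf_mul(a, b):
    """Multiplication on standard-basis integers (bit k <-> alpha^k)."""
    r = 0
    for _ in range(M):
        if b & 1:
            r ^= a
        b >>= 1
        a <<= 1
        if (a >> M) & 1:
            a ^= POLY
    return r

def gf_inverse(a):
    """a^(-1) = a^2 * a^4 * ... * a^(2^(M-1)) for a nonzero element a."""
    result, power = 1, a
    for _ in range(M - 1):
        power = gf_mul(power, power)     # next conjugate a^(2^i)
        result = gf_mul(result, power)
    return result

# Every nonzero element times its computed inverse gives 1.
assert all(gf_mul(a, gf_inverse(a)) == 1 for a in range(1, 1 << M))
```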

The function $f$ as described in Ref. 4 is chosen to be the
following:

$$\begin{aligned}
f(a_0, a_1, a_2, a_3, a_4, a_5, a_6, a_7, b_0, b_1, b_2, b_3, b_4, b_5, b_6, b_7) ={}
& a_5 b_0 + a_6 b_0 + a_3 b_1 + a_5 b_1 + a_4 b_2 + a_5 b_2 + a_6 b_2 + a_7 b_2 \\
&+ a_1 b_3 + a_4 b_3 + a_2 b_4 + a_3 b_4 + a_0 b_5 + a_1 b_5 + a_2 b_5 \\
&+ a_6 b_5 + a_0 b_6 + a_2 b_6 + a_5 b_6 + a_6 b_6 + a_2 b_7
\end{aligned}$$

where the primitive irreducible polynomial of this finite field
is

$$x^8 + x^7 + x^6 + x + 1$$

There are a variety of different possible expressions for the
function $f$; however, the above one was chosen for the purpose
of illustration.¹ Since each term in the above expression
represents a conducting line in the AND portion of a PLA
(programmable logic array), the fewer terms there are, the
smaller the area needed for VLSI implementation.

Figure 3 shows the block diagram of an 8-bit finite field
multiplier using the Massey-Omura normal basis algorithm.
The architecture of this chip is identical to that of the dual
basis multiplier. The differences between these two
multipliers are the following:

(1) The number of terms in the expression of the Massey- 
Omura multiplier is twenty-one, while in the dual basis 
multiplier it is only eleven. This means a substantial 
amount of area is saved in the dual basis multiplier over 
the normal basis multiplier. 

(2) Both the input serial-to-parallel units are identical in 
the Massey-Omura multiplier and no feedback is needed. 
On the other hand, in the dual basis multiplier, the 
register storing the element with standard basis repre- 
sentation does not need to be cyclically shifted. This 
field element remains latched in the same position. 

Figure 4 shows the layout for the 8-bit Massey-Omura 
multiplier. 

V. VLSI Architecture for the Standard Basis 
Multiplier 

The Scott-Tavares-Peppard multiplication algorithm is
serial-in, serial-out, and pipelined in architecture. This
algorithm performs multiplication in $GF(2^m)$ with order $O(m)$
in both computation time and implementation area, but requires
$m + 1$ time units between the first-in and first-out of the
computation. Due to the regularity of this architecture, the
expansion to higher order finite fields needs only replicas of a
basic cell. Furthermore, the irreducible primitive polynomial
which generates the finite field can be changed. This feature
makes it more convenient to use. This algorithm performs the
finite field multiplication with elements represented in standard
basis. As a consequence, no basis conversion is needed. This
multiplier can be used for applications such as cryptography,
where $m$ is large. The algorithm is advantageous because of
its efficient implementation time and high throughput. The
detailed algorithm will not be discussed further here. For more
details, see Ref. 5.

¹ Wang, C. C., “Computer Simulation of Finite Field Multiplications
Based on Massey-Omura’s Normal Basis Representation of Field
Elements,” private communication, 1985.

Figure 5 shows the logic diagram of an 8-bit standard basis
multiplier by Scott, Tavares, and Peppard. Inputs to this chip
are A and B, the two elements to be multiplied, and the
irreducible primitive polynomial F. These are fed into the
chip serially. The output is the product element P, which is
shifted out bit-by-bit.

In Fig. 5, A and F are shifted into their respective registers
serially bit-by-bit. Here A is the multiplicand and F is the
primitive irreducible polynomial that generates the finite
field. The multiplier is denoted by B and the product by P.
The register P contains the intermediate product. Two control
signals are required. One is derived from the most significant
bit (MSB) of P, and the other from the state of $b_i$, which is
latched with a flip-flop. The left shift is performed by loading
the output of cell CELL-i into the product register of cell
CELL-(i + 1). Once the multiplication is completed, the most
significant bits of the product register are transferred to the
output shift register and shifted out serially.

The circuit diagram of the ith cell, CELL-i, is shown in
Fig. 6. Since the ground field is $GF(2)$, additions are
performed by exclusive-OR (XOR) gates. Pass transistors are used
to control the data flow. If a “0” is to be added, the input
line to the XOR gate is grounded; otherwise A and/or F are
passed. The output of the XOR gate is directed to the product
register of the next stage, so adding and shifting are done in one
clock cycle. Figure 7 shows the layout of this 8-bit standard
basis finite field multiplier.
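
The data flow described above (shift the product register, add F when
the MSB overflows, add A when the current multiplier bit is 1) can be
sketched as a generic MSB-first shift-and-add multiplication. The
Python fragment below is written in that spirit and is not claimed to
reproduce the cell-level design of Ref. 5. The function name and the
convention that `f` includes the $x^m$ term are assumptions of this
example.

```python
def standard_basis_multiply(a, b, f, m):
    """MSB-first serial product of a and b in GF(2^m) generated by f.

    a, b: integers whose bit k is the coefficient of alpha^k.
    f:    field polynomial including the x^m term, e.g. 0x11D.
    """
    p = 0                                # product register P
    for i in range(m - 1, -1, -1):
        p <<= 1                          # left shift of P
        if (p >> m) & 1:                 # control signal from the MSB of P
            p ^= f                       # add F (XOR)
        if (b >> i) & 1:                 # control signal from the bit b_i
            p ^= a                       # add A (XOR)
    return p

# The field polynomial is an input, so it can be changed freely.
print(hex(standard_basis_multiply(0x80, 0x02, 0x11D, 8)))   # prints 0x1d
print(hex(standard_basis_multiply(0x80, 0x02, 0x1C3, 8)))   # prints 0xc3
```

Because F is just another operand, the same routine serves fields of
any degree $m$ without restructuring, which parallels the replication
of a basic cell in the hardware.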


VI. Concluding Remarks 

Three finite field multipliers are compared here. They are the
dual basis multiplier, the normal basis multiplier, and the
standard basis multiplier. The dual basis multiplier occupies the
smallest amount of chip area in VLSI implementation if the basis
conversion is not included. Furthermore, since the dual basis
multiplier performs multiplication by taking the inner product of
two elements and then feeds back the sum of certain bits of
one element, it is expected that as the order of the field goes
higher, the dual basis multiplier will outperform the others.
The normal basis multiplier is very effective in performing
operations such as finding the inverse element or in performing
squaring or exponentiation of a finite field element. But the
area grows dramatically as the order of the field goes up. Also,
the $f$ function described in Ref. 4 has to be searched for again
by computer whenever the field is changed, and this search is
usually very time consuming. The standard basis multiplier does
not require basis conversion; hence it is readily matched to any
input or output system. Also, due to its regularity and
simplicity, the design and expansion to high order finite fields
are easier to realize than for the dual or normal basis
multipliers. The irreducible primitive polynomial of the field is
changeable in the standard basis multiplier. This distinct
feature makes it more useful in certain aspects.

Fig. 1. Logic diagram of an 8-bit dual basis finite
field multiplier

Fig. 3. Block diagram of an 8-bit finite field multiplier
using Massey-Omura’s normal basis algorithm


Fig. 4. Layout of an 8-bit Massey-Omura finite field multiplier

Appendix A

A Method for Converting an Element in Standard Basis to Dual Basis

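In this appendix, a method for converting an element
represented in standard basis to its counterpart in dual basis is
described by example. First, let the irreducible primitive
polynomial in $GF(2^8)$ be

$$f(x) = x^8 + x^4 + x^3 + x^2 + 1$$

Then, from the definition of trace, one obtains

$$\mathrm{Tr}(1) = 0,\ \mathrm{Tr}(\alpha) = 0,\ \mathrm{Tr}(\alpha^2) = 0,\ \mathrm{Tr}(\alpha^3) = 0,$$
$$\mathrm{Tr}(\alpha^4) = 0,\ \mathrm{Tr}(\alpha^5) = 1,\ \mathrm{Tr}(\alpha^6) = 0,\ \mathrm{Tr}(\alpha^7) = 0$$

where $\alpha$ satisfies the equation $x^8 + x^4 + x^3 + x^2 + 1 = 0$.

An element $Z$ in standard basis is written as

$$Z = \sum_{k=0}^{7} z_k \alpha^k$$

In dual basis, it is represented as

$$Z = \sum_{k=0}^{7} z'_k \lambda_k$$

where

$$\begin{aligned}
z'_k &= \mathrm{Tr}(Z \alpha^k) \\
&= \mathrm{Tr}((z_0 \alpha^0 + z_1 \alpha^1 + z_2 \alpha^2 + z_3 \alpha^3 + z_4 \alpha^4 + z_5 \alpha^5 + z_6 \alpha^6 + z_7 \alpha^7)\, \alpha^k) \\
&= z_0 \mathrm{Tr}(\alpha^k) + z_1 \mathrm{Tr}(\alpha^{k+1}) + z_2 \mathrm{Tr}(\alpha^{k+2}) + z_3 \mathrm{Tr}(\alpha^{k+3}) \\
&\quad + z_4 \mathrm{Tr}(\alpha^{k+4}) + z_5 \mathrm{Tr}(\alpha^{k+5}) + z_6 \mathrm{Tr}(\alpha^{k+6}) + z_7 \mathrm{Tr}(\alpha^{k+7})
\end{aligned}$$

Therefore, once $\mathrm{Tr}(\alpha^k)$, for $0 \le k \le 14$, are
known, the basis conversion from standard to dual can be completed.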
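
The conversion can be sketched directly from the last expression. The
Python fragment below is an illustration under the same polynomial
$f(x) = x^8 + x^4 + x^3 + x^2 + 1$. It tabulates $\mathrm{Tr}(\alpha^k)$
for $0 \le k \le 14$ and then forms each $z'_k$ as a sum of selected
input bits. The names `gf_mul`, `trace`, `TR` and `standard_to_dual`
are hypothetical.

```python
M, POLY = 8, 0x11D            # f(x) = x^8 + x^4 + x^3 + x^2 + 1

def gf_mul(a, b):
    """Multiplication on standard-basis integers (bit k <-> alpha^k)."""
    r = 0
    for _ in range(M):
        if b & 1:
            r ^= a
        b >>= 1
        a <<= 1
        if (a >> M) & 1:
            a ^= POLY
    return r

def trace(beta):
    total, power = 0, beta
    for _ in range(M):
        total ^= power
        power = gf_mul(power, power)
    return total

# TR[k] = Tr(alpha^k) for 0 <= k <= 14.
TR, alpha_power = [], 1
for _ in range(2 * M - 1):
    TR.append(trace(alpha_power))
    alpha_power = gf_mul(alpha_power, 0b10)   # multiply by alpha

def standard_to_dual(z):
    """Return [z'_0, ..., z'_7] with z'_k = sum_j z_j * Tr(alpha^(j+k))."""
    z_bits = [(z >> j) & 1 for j in range(M)]
    return [sum(z_bits[j] & TR[j + k] for j in range(M)) % 2
            for k in range(M)]

# Agreement with the defining relation z'_k = Tr(Z * alpha^k).
assert all(standard_to_dual(z)[k] == trace(gf_mul(z, 1 << k))
           for z in range(1 << M) for k in range(M))
print(TR[:M])    # prints [0, 0, 0, 0, 0, 1, 0, 0]
```

Since `TR` contains only a few ones, each dual coordinate is the XOR of
a small number of input bits, which is why the conversion is simple for
this choice of polynomial.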
Appendix B

A Method for Converting an Element in Dual Basis to Standard Basis 


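This appendix describes a method for converting an element
represented in dual basis to standard basis. Again, let an
element $Z$ in dual basis be written as

$$Z = \sum_{k=0}^{7} z'_k \lambda_k$$

In standard basis let it be represented as

$$Z = \sum_{k=0}^{7} z_k \alpha^k$$

From the definition of the trace, one obtains

$$\begin{aligned}
z_k &= \mathrm{Tr}(Z \lambda_k) \\
&= \mathrm{Tr}((z'_0 \lambda_0 + z'_1 \lambda_1 + z'_2 \lambda_2 + z'_3 \lambda_3 + z'_4 \lambda_4 + z'_5 \lambda_5 + z'_6 \lambda_6 + z'_7 \lambda_7)\, \lambda_k) \\
&= z'_0 \mathrm{Tr}(\lambda_0 \lambda_k) + z'_1 \mathrm{Tr}(\lambda_1 \lambda_k) + z'_2 \mathrm{Tr}(\lambda_2 \lambda_k) + z'_3 \mathrm{Tr}(\lambda_3 \lambda_k) \\
&\quad + z'_4 \mathrm{Tr}(\lambda_4 \lambda_k) + z'_5 \mathrm{Tr}(\lambda_5 \lambda_k) + z'_6 \mathrm{Tr}(\lambda_6 \lambda_k) + z'_7 \mathrm{Tr}(\lambda_7 \lambda_k)
\end{aligned}$$

Hence, if the dual basis is determined and the trace values
above are calculated, the basis conversion from dual basis to
standard basis can be completed.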
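
A possible software sketch of this conversion follows. It assumes the
polynomial $f(x) = x^8 + x^4 + x^3 + x^2 + 1$ of Appendix A, finds the
dual basis by exhaustive search (an approach chosen here for brevity,
not taken from the article), tabulates
$\mathrm{Tr}(\lambda_j \lambda_k)$, and checks the result by a round
trip through the standard-to-dual conversion. All names are
hypothetical.

```python
M, POLY = 8, 0x11D            # f(x) = x^8 + x^4 + x^3 + x^2 + 1

def gf_mul(a, b):
    """Multiplication on standard-basis integers (bit k <-> alpha^k)."""
    r = 0
    for _ in range(M):
        if b & 1:
            r ^= a
        b >>= 1
        a <<= 1
        if (a >> M) & 1:
            a ^= POLY
    return r

def trace(beta):
    total, power = 0, beta
    for _ in range(M):
        total ^= power
        power = gf_mul(power, power)
    return total

TRACE = [trace(x) for x in range(1 << M)]    # trace of every element

# Dual basis of {1, alpha, ..., alpha^7}: Tr(alpha^j * lambda_k) = [j == k].
DUAL = [next(lam for lam in range(1 << M)
             if all(TRACE[gf_mul(1 << j, lam)] == (j == k) for j in range(M)))
        for k in range(M)]

# TR_LL[j][k] = Tr(lambda_j * lambda_k), the trace values used above.
TR_LL = [[TRACE[gf_mul(lj, lk)] for lk in DUAL] for lj in DUAL]

def dual_to_standard(z_dual):
    """z_k = sum_j z'_j * Tr(lambda_j * lambda_k); returns Z as an integer."""
    z = 0
    for k in range(M):
        bit = 0
        for j in range(M):
            bit ^= z_dual[j] & TR_LL[j][k]
        z |= bit << k
    return z

def standard_to_dual(z):
    """z'_k = Tr(Z * alpha^k), as in Appendix A."""
    return [TRACE[gf_mul(z, 1 << k)] for k in range(M)]

# Round trip: standard -> dual -> standard returns the original element.
assert all(dual_to_standard(standard_to_dual(z)) == z for z in range(1 << M))
```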
Appendix C

Fast Algorithm for Calculating Trace Values of Elements in GF(2 m ) 


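In this appendix, a fast algorithm for calculating the trace
values of elements in the finite field $GF(2^m)$ is described. From
the definition of the trace one has, for $\beta, \beta^2 \in GF(2^m)$:

$$\mathrm{Tr}(\beta^2) = \sum_{k=0}^{m-1} (\beta^2)^{2^k} = \beta^2 + \beta^{2^2} + \cdots + \beta^{2^m} = \beta + \beta^2 + \cdots + \beta^{2^{m-1}} = \mathrm{Tr}(\beta)$$

Hence, if $\mathrm{Tr}(\beta)$ is obtained, then
$\mathrm{Tr}(\beta^2)$ can also be obtained without calculation.

Every element in $GF(2^m)$ can be represented by the elements
which compose the basis. For example, if $m = 8$ and the basis is
$\{\alpha^0, \alpha^1, \alpha^2, \alpha^3, \alpha^4, \alpha^5, \alpha^6, \alpha^7\}$,
then $\beta$ can be written as

$$\beta = \beta_0 \alpha^0 + \beta_1 \alpha^1 + \beta_2 \alpha^2 + \beta_3 \alpha^3 + \beta_4 \alpha^4 + \beta_5 \alpha^5 + \beta_6 \alpha^6 + \beta_7 \alpha^7$$

From the properties of the trace, one has

$$\begin{aligned}
\mathrm{Tr}(\beta) ={}& \beta_0 \mathrm{Tr}(\alpha^0) + \beta_1 \mathrm{Tr}(\alpha^1) + \beta_2 \mathrm{Tr}(\alpha^2) + \beta_3 \mathrm{Tr}(\alpha^3) \\
&+ \beta_4 \mathrm{Tr}(\alpha^4) + \beta_5 \mathrm{Tr}(\alpha^5) + \beta_6 \mathrm{Tr}(\alpha^6) + \beta_7 \mathrm{Tr}(\alpha^7)
\end{aligned}$$

Hence it is only necessary to calculate the trace values of
$\alpha^0, \alpha^1, \alpha^2, \alpha^3, \alpha^4, \alpha^5, \alpha^6, \alpha^7$;
the trace of any other element can be obtained easily once it is
represented by the basis elements.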
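
The shortcut can be compared against the definition of the trace. The
Python sketch below is illustrative. It assumes the polynomial
$f(x) = x^8 + x^4 + x^3 + x^2 + 1$ of Appendix A, and the names
`gf_mul`, `trace`, `BASIS_TRACES` and `fast_trace` are hypothetical.

```python
M, POLY = 8, 0x11D            # f(x) = x^8 + x^4 + x^3 + x^2 + 1

def gf_mul(a, b):
    """Multiplication on standard-basis integers (bit k <-> alpha^k)."""
    r = 0
    for _ in range(M):
        if b & 1:
            r ^= a
        b >>= 1
        a <<= 1
        if (a >> M) & 1:
            a ^= POLY
    return r

def trace(beta):
    """Trace from the definition: sum of beta^(2^k), k = 0..M-1."""
    total, power = 0, beta
    for _ in range(M):
        total ^= power
        power = gf_mul(power, power)
    return total

# Only the traces of the basis elements alpha^0..alpha^7 are computed.
BASIS_TRACES = [trace(1 << k) for k in range(M)]

def fast_trace(beta):
    """Tr(beta) = sum_k beta_k * Tr(alpha^k), using linearity of the trace."""
    t = 0
    for k in range(M):
        t ^= ((beta >> k) & 1) & BASIS_TRACES[k]
    return t

assert all(fast_trace(b) == trace(b) for b in range(1 << M))
# Tr(beta^2) = Tr(beta) for every element.
assert all(trace(gf_mul(b, b)) == trace(b) for b in range(1 << M))
```

With this polynomial only $\mathrm{Tr}(\alpha^5)$ is nonzero among the
basis traces, so the trace of an element reduces to its coefficient
$\beta_5$.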

